It is natural to ask: what kinds of matrices satisfy the Restricted Eigenvalue (RE) condition? In this
paper, we associate the RE condition (Bickel-Ritov-Tsybakov 09) with the complexity of a subset of the
sphere in $\mathbb{R}^p$, where $p$ is the dimensionality of the data, and show that a class of random matrices with
independent rows, but not necessarily independent columns, satisfies the RE condition when the sample
size is above a certain lower bound. Here we explicitly introduce an additional covariance structure to
the class of random matrices that are by now known to satisfy the Restricted Isometry Property
as defined in Candes and Tao 05 (and hence the RE condition), in order to compose a broader class
of random matrices for which the RE condition holds. In this case, tools from geometric functional
analysis that characterize the intrinsic low-dimensional structures associated with the RE condition have
been crucial in analyzing the sample complexity and understanding its statistical implications for high
dimensional data.

1 Introduction

In a typical high dimensional setting, the number of variables $p$ is much larger than the number of
observations $n$. This challenging setting appears in linear regression, signal recovery, covariance selection
in graphical modeling, and sparse approximations. In this paper, we consider recovering $\beta \in \mathbb{R}^p$ in the
following linear model:

$$ Y = X\beta + \epsilon, \tag{1.1} $$

where $X$ is an $n \times p$ design matrix, $Y$ is a vector of noisy observations, and $\epsilon$ is the noise term. The
design matrix is treated as either fixed or random. We assume throughout this paper that $p \ge n$ (i.e. high-
dimensional) and $\epsilon \sim N(0, \sigma^2 I_n)$. We also assume that the columns of $X$ have $\ell_2$ norms
of the order of $\sqrt{n}$, which holds with overwhelming probability for the random designs that we shall
consider.

The restricted eigenvalue (RE) conditions as formalized by Bickel et al. (2009)¹ are among the weakest
and hence the most general conditions in the literature imposed on the Gram matrix in order to guarantee nice
statistical properties for the Lasso and the Dantzig selector; for example, under this condition, they derived
bounds on the $\ell_2$ prediction loss and on the $\ell_p$ loss, where $1 \le p \le 2$, for estimating the parameters, for both the
Lasso and the Dantzig selector, in both linear regression and nonparametric regression models. From now
on, we refer to their conditions in general as the RE condition. Before we elaborate upon the RE condition,
we need some notation and some more definitions to put this condition in perspective.

Consider the linear regression model in (1.1). For a chosen penalization parameter $\lambda_n \ge 0$, regularized
estimation with the $\ell_1$-norm penalty, also known as the Lasso (Tibshirani, 1996) or Basis Pursuit (Chen et al.,
1998), refers to the following convex optimization problem:

$$ \hat\beta = \arg\min_{\beta} \frac{1}{2n} \|Y - X\beta\|_2^2 + \lambda_n \|\beta\|_1, \tag{1.2} $$

where the scaling factor $1/(2n)$ is chosen for convenience.

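The Dantzig selector (Candes and Tao, 2007), for a given $\lambda_n \ge 0$, is defined as

$$ (DS) \quad \arg\min_{\tilde\beta \in \mathbb{R}^p} \|\tilde\beta\|_1 \quad \text{subject to} \quad \left\| \frac{1}{n} X^T (Y - X\tilde\beta) \right\|_\infty \le \lambda_n. \tag{1.3} $$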
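To make the two estimators concrete, the following minimal Python sketch evaluates the Lasso objective in (1.2) and the left-hand side of the Dantzig constraint in (1.3) for a candidate coefficient vector. It is an illustration only, not a solver: the function names and the toy data are hypothetical, and matrices are stored as plain lists of rows.

```python
def matvec(X, b):
    """Return X @ b for a matrix X stored as a list of rows."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]

def lasso_objective(X, y, beta, lam):
    """(1/(2n)) * ||y - X beta||_2^2 + lam * ||beta||_1, as in (1.2)."""
    n = len(X)
    resid = [yi - fi for yi, fi in zip(y, matvec(X, beta))]
    return sum(r * r for r in resid) / (2 * n) + lam * sum(abs(b) for b in beta)

def dantzig_constraint(X, y, beta):
    """||X^T (y - X beta) / n||_inf, the left side of the constraint in (1.3)."""
    n, p = len(X), len(X[0])
    resid = [yi - fi for yi, fi in zip(y, matvec(X, beta))]
    return max(abs(sum(X[i][j] * resid[i] for i in range(n))) / n for j in range(p))

# Toy, noiseless data (hypothetical values): y = X beta_true exactly.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
beta_true = [2.0, 0.0, 0.0]
y = matvec(X, beta_true)

print(lasso_objective(X, y, beta_true, 0.5))
print(dantzig_constraint(X, y, beta_true))
print(dantzig_constraint(X, y, [0.0, 0.0, 0.0]))
```

On this noiseless toy example the true coefficient vector attains a constraint value of zero, so it is feasible for (1.3) for every $\lambda_n \ge 0$, whereas the zero vector is feasible only when $\lambda_n \ge 1$.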



For an integer $1 \le s \le p/2$, we refer to a vector $\beta \in \mathbb{R}^p$ with at most $s$ non-zero entries as an $s$-sparse
vector. Let $\beta_T \in \mathbb{R}^{|T|}$ be the subvector of $\beta \in \mathbb{R}^p$ confined to $T$. One of the common properties of the Lasso
and the Dantzig selector is: for an appropriately chosen $\lambda_n$, for the vector $\upsilon := \hat\beta - \beta$, where $\beta$ is an $s$-sparse
vector and $\hat\beta$ is the solution from either the Lasso or the Dantzig selector, it holds with high probability (cf.
Section C) that

$$ \|\upsilon_{I^c}\|_1 \le k_0 \|\upsilon_I\|_1, \tag{1.4} $$

where $I \subset \{1, \ldots, p\}$, $|I| \le s$, is the support of $\beta$, $k_0 = 1$ for the Dantzig selector, and $k_0 = 3$ for the
Lasso; see Bickel et al. (2009) and Candes and Tao (2007) for the case where the columns of $X$ have $\ell_2$ norm $\sqrt{n}$. We
use $\upsilon_{T_0}$ to always represent the subvector of $\upsilon \in \mathbb{R}^p$ confined to $T_0$, which corresponds to the locations of
the $s$ largest coefficients of $\upsilon$ in absolute value; then (1.4) implies that (see Proposition 1.4)

$$ \|\upsilon_{T_0^c}\|_1 \le k_0 \|\upsilon_{T_0}\|_1. \tag{1.5} $$

We are now ready to introduce the Restricted Eigenvalue assumption that is formalized in Bickel et al.
(2009). In Section 3, we show the $\ell_p$ convergence rate for $p = 1, 2$ for both the Lasso and the Dantzig
selector under this condition for the purpose of completeness.

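Assumption 1.1. (Restricted Eigenvalue assumption $RE(s, k_0, X)$ (Bickel et al., 2009)) For some
integer $1 \le s \le p$ and a positive number $k_0$, the following holds:

$$ \frac{1}{K(s, k_0, X)} := \min_{\substack{J_0 \subseteq \{1,\ldots,p\} \\ |J_0| \le s}} \; \min_{\substack{\upsilon \ne 0 \\ \|\upsilon_{J_0^c}\|_1 \le k_0 \|\upsilon_{J_0}\|_1}} \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_0}\|_2} > 0. \tag{1.6} $$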



¹ We note that the authors have defined two such conditions, which we show are equivalent up to the constant defined within
each definition; see Proposition A.1 and Proposition A.2 in Section A.3 for details.

Definition 1.1. Throughout this paper, we say that a vector $\upsilon \in \mathbb{R}^p$ is admissible to (1.6), or equivalently
to (1.12), for a given $k_0 > 0$ as defined therein, if $\upsilon \ne 0$ and, for some $J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$,
it holds that $\|\upsilon_{J_0^c}\|_1 \le k_0 \|\upsilon_{J_0}\|_1$. Now it is clear that if $\upsilon$ is admissible to (1.6), or equivalently to (1.12),
then (1.5) holds (cf. Proposition 1.4).

If $RE(s, k_0, X)$ is satisfied with $k_0 \ge 1$, then the square submatrices of size $\le 2s$ of $X^T X/n$ are necessarily
positive definite (see Bickel et al. (2009)). We note the "universality" of this condition, as it is not tailored to
any particular set $J_0$. We also note that given such a universality condition, it is sufficient to check whether, for all
$\upsilon \ne 0$ that are admissible to (1.6) and for $K(s, k_0, X) > 0$, the following inequality

$$ \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{T_0}\|_2} \ge \frac{1}{K(s, k_0, X)} > 0 \tag{1.7} $$

holds, where $T_0$ corresponds to the locations of the $s$ largest coefficients of $\upsilon$ in absolute value, as (1.7) is both
necessary and sufficient to guarantee that (1.6) holds; see Proposition 1.4 for details.

A special class of design matrices that satisfy the RE condition are the random design matrices. This is
shown in a large body of work in the high dimensional setting, for example (Candes et al., 2006; Candes and Tao,
2005, 2007; Baraniuk et al., 2008; Mendelson et al., 2008; Adamczak et al., 2009), which shows that a uniform
uncertainty principle (UUP, a condition that is stronger than the RE condition, see Bickel et al. (2009))
holds for "generic" or random design matrices for very significant values of $s$; roughly speaking, UUP holds
when the $2s$-restricted isometry constant $\theta_{2s}$ is small, which we now define. Let $X_T$, where $T \subset \{1, \ldots, p\}$,
be the $n \times |T|$ submatrix obtained by extracting columns of $X$ indexed by $T$.

Definition 1.2. (Candes and Tao, 2005) For each integer $s = 1, 2, \ldots$, the $s$-restricted isometry constant $\theta_s$
of $X$ is the smallest quantity such that

$$ (1 - \theta_s) \|c\|_2^2 \le \|X_T c\|_2^2 / n \le (1 + \theta_s) \|c\|_2^2, \tag{1.8} $$

for all $T \subset \{1, \ldots, p\}$ with $|T| \le s$ and coefficient sequences $(c_j)_{j \in T}$.

It is well known that for a random matrix the UUP holds for $s = O(n/\log(p/n))$ with i.i.d. Gaussian
random variables (that is, the Gaussian random ensemble, subject to normalization of columns), the Bernoulli,
and in general the subgaussian ensembles (Baraniuk et al., 2008; Mendelson et al., 2008) (cf. Theorem 2.5).
Recently, it has been shown (Adamczak et al., 2009) that UUP holds for $s = O(n/\log^2(p/n))$ when $X$ is a random
matrix composed of columns that are independent isotropic vectors with log-concave densities. Hence this
setup only requires $\Theta(\log(p/n))$ or $\Theta(\log^2(p/n))$ observations per nonzero value in $\beta$, where $\Theta$ hides a very
small constant, when $n$ is a nonnegligible fraction of $p$, in order to perform accurate statistical estimation;
we call this level of sparsity the linear sparsity.

The main purpose of this paper is to extend the family of random matrices from the i.i.d. subgaussian
ensemble $\Psi$ (cf. (1.10)), which is now well known to satisfy the UUP condition and hence the RE condition
under linear sparsity, to a larger family of random matrices $X := \Psi\Sigma^{1/2}$, where $\Sigma$ is assumed to behave
sufficiently nicely in the sense that it satisfies certain restricted eigenvalue conditions to be defined in
Section 1.1. Thus we have explicitly introduced the additional covariance structure $\Sigma$ to the columns of $\Psi$ in
generating $X$. In Theorem 1.6, we show that $X$ satisfies the RE condition with overwhelming probability
once we have $n > C s \log(cp/s)$, where $c$ is an absolute constant and $C$ depends on the restricted eigenvalues
of $\Sigma$ (cf. (1.19)), when $\Sigma$ satisfies the restricted eigenvalue assumption to be specified in Section 1.1. We
believe such results can be extended to other cases: for example, when $X$ is the composition of a random
Fourier ensemble, or randomly sampled rows of orthonormal matrices; see for example Candes and Tao
(2006, 2007).

Finally, we show rate of convergence results for the Lasso and the Dantzig selector given such random 
matrices. Although such results are almost entirely known, we provide a complete analysis for a self- 
contained presentation. Given these rates of convergence (cf. Theorem 3.1 and Theorem 3.2), one can 
exploit thresholding algorithms to adjust the bias and get rid of excessive variables selected by an initial 
estimator relying on $\ell_1$ regularized minimization functions, for example, the Lasso or the Dantzig selector;
under the UUP or the RE type of conditions, such procedures are shown to select a sparse model, which
contains the set of variables in $\beta$ that are significant in their absolute values; in addition, one can then
conduct an ordinary least squares regression on such a sparse model to obtain a final estimator, whose bias 
is significantly reduced compared to the initial estimators. Such algorithms are proposed and analyzed in a 
series of papers, for example Candes and Tao (2007); Meinshausen and Yu (2009); Wasserman and Roeder 
(2009); Zhou (2009). 



1.1 Restricted eigenvalue assumption for a random design 

We will define the family of random matrices that we consider and the restricted eigenvalue assumption that 
we impose on such a random design. We need some more definitions. 

Definition 1.3. Let $Y$ be a random vector in $\mathbb{R}^p$; $Y$ is called isotropic if for every $y \in \mathbb{R}^p$,
$\mathbb{E}|\langle Y, y\rangle|^2 = \|y\|_2^2$, and is $\psi_2$ with a constant $\alpha$ if for every $y \in \mathbb{R}^p$,

$$ \|\langle Y, y\rangle\|_{\psi_2} := \inf\{t : \mathbb{E}\exp(\langle Y, y\rangle^2/t^2) \le 2\} \le \alpha \|y\|_2. \tag{1.9} $$

The important examples of isotropic, subgaussian vectors are the Gaussian random vector $Y = (h_1, \ldots, h_p)$,
where $h_i, \forall i$, are independent $N(0, 1)$ random variables, and the random vector $Y = (\varepsilon_1, \ldots, \varepsilon_p)$, where
$\varepsilon_i, \forall i$, are independent, symmetric $\pm 1$ Bernoulli random variables.

A subgaussian or $\psi_2$ operator is a random operator $\Gamma : \mathbb{R}^p \to \mathbb{R}^n$ of the form

$$ \Gamma = \sum_{i=1}^n \langle \Psi_i, \cdot \rangle e_i, \tag{1.10} $$

where $e_1, \ldots, e_n$ are the canonical basis of $\mathbb{R}^n$ and $\Psi_1, \ldots, \Psi_n$ are independent copies of an isotropic $\psi_2$
vector $\Psi_0$ on $\mathbb{R}^p$. Note that throughout this paper, $\Gamma$ is represented by a random matrix $\Psi$ whose rows are
$\Psi_1, \ldots, \Psi_n$. Throughout this paper, we consider a random design matrix $X$ that is generated as follows:

$$ X := \Psi\Sigma^{1/2}, \quad \text{where we assume } \Sigma_{jj} = 1, \ \forall j = 1, \ldots, p, \tag{1.11} $$

and $\Psi$ is a random matrix whose rows $\Psi_1, \ldots, \Psi_n$ are independent copies of an isotropic $\psi_2$ vector $\Psi_0$ on
$\mathbb{R}^p$ as in Definition 1.3. For a random design $X$ as in (1.11), we make the following assumption on $\Sigma$.
A slightly stronger condition has been originally defined in Zhou et al. (2009) in the context of Gaussian
graphical modeling.

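Assumption 1.2. (Restricted eigenvalue condition $RE(s, k_0, \Sigma)$) Suppose $\Sigma_{jj} = 1$, $\forall j = 1, \ldots, p$, and
for some integer $1 \le s \le p$ and a positive number $k_0$, the following condition holds:

$$ \frac{1}{K(s, k_0, \Sigma)} := \min_{\substack{J_0 \subseteq \{1,\ldots,p\} \\ |J_0| \le s}} \; \min_{\substack{\upsilon \ne 0 \\ \|\upsilon_{J_0^c}\|_1 \le k_0 \|\upsilon_{J_0}\|_1}} \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{J_0}\|_2} > 0. \tag{1.12} $$

We note that, similar to the case in Assumption 1.1, it is sufficient to check whether, for all $\upsilon \ne 0$ that are admissible
to (1.12) and for $K(s, k_0, \Sigma) > 0$, the following inequality

$$ \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{T_0}\|_2} \ge \frac{1}{K(s, k_0, \Sigma)} > 0 \tag{1.13} $$

holds, where $T_0$ corresponds to the locations of the $s$ largest coefficients of $\upsilon$ in absolute value. Formally, we
have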

Proposition 1.4. Let $1 \le s \le p/2$ be an integer and $k_0 > 0$. Suppose $\delta \ne 0$ is admissible to (1.12), or
equivalently to (1.6), in the sense of Definition 1.1; then

$$ \|\delta_{T_0^c}\|_1 \le k_0 \|\delta_{T_0}\|_1. \tag{1.14} $$

Hence (1.13) is both necessary and sufficient to guarantee that (1.12) holds. Similarly, (1.7) is a necessary
and sufficient condition for (1.6) to hold. Moreover, suppose that $\Sigma$ satisfies Assumption 1.2; then for $\delta$ that
is admissible to (1.12), we have

$$ \|\Sigma^{1/2}\delta\|_2 \ge \frac{\|\delta_{J_0}\|_2}{K(s, k_0, \Sigma)} > 0. $$

We now define

$$ \sqrt{\rho_{\min}(m)} := \min_{\substack{t \ne 0 \\ |\mathrm{supp}(t)| \le m}} \frac{\|\Sigma^{1/2} t\|_2}{\|t\|_2}, \tag{1.15} $$

$$ \sqrt{\rho_{\max}(m)} := \max_{\substack{t \ne 0 \\ |\mathrm{supp}(t)| \le m}} \frac{\|\Sigma^{1/2} t\|_2}{\|t\|_2}, \tag{1.16} $$

where we assume that $\sqrt{\rho_{\max}(m)}$ is a constant for $m \le p/2$. If $RE(s, k_0, \Sigma)$ is satisfied with $k_0 \ge 1$, then
the square submatrices of size $\le 2s$ of $\Sigma$ are necessarily positive definite (see Bickel et al. (2009)); hence
throughout this paper, we also assume that

$$ \rho_{\min}(2s) > 0. \tag{1.17} $$

Note that when $\Psi$ is a Gaussian random matrix with i.i.d. $N(0, 1)$ random variables, $X$ as in (1.11)
corresponds to a random matrix with independent rows, such that each row is a random vector that follows a
multivariate normal distribution $N(0, \Sigma)$:

$$ X \text{ has i.i.d. rows} \sim N(0, \Sigma), \text{ where we assume } \Sigma_{jj} = 1, \ \forall j = 1, \ldots, p. \tag{1.18} $$
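As an illustration of (1.18), the sketch below draws a Gaussian design whose rows are i.i.d. $N(0, \Sigma)$ for one particular covariance with unit diagonal, $\Sigma_{ij} = r^{|i-j|}$, and reports the scaled column norms $\|X_j\|_2/\sqrt{n}$. The AR(1)-type choice of $\Sigma$, the recursion used to sample from it (instead of forming $\Sigma^{1/2}$ explicitly, which gives the same row distribution in the Gaussian case), and all parameter values are assumptions made for this example only.

```python
import math
import random

def ar1_gaussian_design(n, p, r, rng):
    """Draw n i.i.d. rows from N(0, Sigma) with Sigma_ij = r**|i-j| (unit diagonal)."""
    scale = math.sqrt(1.0 - r * r)
    X = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            # AR(1) recursion: keeps every coordinate at variance 1.
            row.append(r * row[-1] + scale * rng.gauss(0.0, 1.0))
        X.append(row)
    return X

def scaled_column_norms(X):
    """Return ||X_j||_2 / sqrt(n) for every column j of X."""
    n, p = len(X), len(X[0])
    return [math.sqrt(sum(X[i][j] ** 2 for i in range(n)) / n) for j in range(p)]

rng = random.Random(0)  # fixed seed so the run is reproducible
X = ar1_gaussian_design(n=500, p=20, r=0.5, rng=rng)
norms = scaled_column_norms(X)
print(min(norms), max(norms))
```

With $n = 500$ the scaled column norms concentrate near 1, which is the normalization assumed for the columns of $X$ throughout the paper.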

Finally, we need the following notation. For a set $V \subset \mathbb{R}^p$, we let $\mathrm{conv}\, V$ denote the convex hull of $V$. For
a finite set $Y$, the cardinality is denoted by $|Y|$. Let $B_2^p$ and $S^{p-1}$ be the unit Euclidean ball and the unit
sphere respectively.

1.2 The main theorem

Throughout this section, we assume that $\Sigma$ satisfies (1.12) and (1.16) for $m = s$. We assume $k_0 > 0$ and it
is understood to be the same quantity throughout our discussion. Let us define



max 

(s), (1.19) 

where $k_0 > 0$ is understood to be the same as in (1.20). Our main result in Theorem 1.6 roughly says
that for a random matrix $X := \Psi\Sigma^{1/2}$, which is the product of a random subgaussian ensemble $\Psi$ and a
fixed positive semi-definite matrix $\Sigma^{1/2}$, the RE condition will be satisfied with overwhelming probability,
given $n$ that is sufficiently large (cf. (1.21)). Before introducing the theorem formally, we define the class
of vectors $E_s$, for a particular integer $1 \le s \le p/2$, that are relevant to the RE Assumptions 1.1 and 1.2. For
any given subset $J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$, we consider the set of vectors $\delta$ such that

$$ \|\delta_{J_0^c}\|_1 \le k_0 \|\delta_{J_0}\|_1 \tag{1.20} $$

holds for some $k_0 > 0$, subject to a normalization condition such that $\Sigma^{1/2}\delta \in S^{p-1}$; we then define the
set $E'_s$ as the union of all vectors that satisfy the cone constraint as in (1.20) with respect to any index set
$J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$:

$$ E'_s = \left\{\delta : \|\Sigma^{1/2}\delta\|_2 = 1 \text{ s.t. } \exists J_0 \subseteq \{1, \ldots, p\} \text{ s.t. } |J_0| \le s \text{ and (1.20) holds}\right\}. $$

We now define an even broader set: let $\delta_{T_0}$ be the subvector of $\delta$ confined to the locations of its $s$ largest
coefficients:

$$ E_s = \left\{\delta : \|\Sigma^{1/2}\delta\|_2 = 1 \text{ s.t. } \|\delta_{T_0^c}\|_1 \le k_0 \|\delta_{T_0}\|_1 \text{ holds}\right\}. $$

Remark 1.5. It is clear from Proposition 1.4 that $E'_s \subset E_s$ for the same $k_0 > 0$.

Theorem 1.6 is the main contribution of this paper.

Theorem 1.6. Set $1 \le n \le p$, $0 < \theta < 1$, and $s < p/2$. Let $\Psi_0$ be an isotropic $\psi_2$ random vector on
$\mathbb{R}^p$ with constant $\alpha$ as in Definition 1.3 and $\Psi_1, \ldots, \Psi_n$ be independent copies of $\Psi_0$. Let $\Psi$ be a random
matrix in $\mathbb{R}^{n \times p}$ whose rows are $\Psi_1, \ldots, \Psi_n$. Let $\Sigma$ satisfy (1.12) and (1.16). If $n$ satisfies, for $C$ as defined
in (1.19),

$$ n > \frac{c' \alpha^4}{\theta^2} \max\left(C^2 s \log(5ep/s),\ 9 \log p\right), \tag{1.21} $$

then with probability at least $1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4)$, we have for all $\delta \in E_s$,

$$ 1 - \theta \le \frac{\|\Psi\Sigma^{1/2}\delta\|_2}{\sqrt{n}} \le 1 + \theta, \quad \text{and} \tag{1.22} $$

$$ \forall \rho_i, \quad 1 - \theta \le \frac{\|\Psi\rho_i\|_2}{\sqrt{n}} \le 1 + \theta, \tag{1.23} $$

where $\rho_1, \ldots, \rho_p$ are the column vectors of $\Sigma^{1/2}$, and $c', \bar{c} > 0$ are absolute constants.

We now state some immediate consequences of Theorem 1.6. Consider the random design $X = \Psi\Sigma^{1/2}$
as defined in Theorem 1.6. It is clear that when all columns of $X$ have a Euclidean norm close to $\sqrt{n}$, as
guaranteed by (1.23) for $0 < \theta < 1$ that is small, it makes sense to discuss the RE condition in the form
of (1.6). We now define the following event $\mathcal{R}(\theta)$ on a random design $X$, which provides an upper bound on
$K(s, k_0, X)$ for a given $k_0 > 0$, when $X$ satisfies Assumption $RE(s, k_0, X)$:

$$ \mathcal{R}(\theta) := \left\{X : RE(s, k_0, X) \text{ holds with } 0 < K(s, k_0, X) \le \frac{K(s, k_0, \Sigma)}{1 - \theta}\right\}. \tag{1.24} $$



Under Assumption 1.2, we consider the set of vectors $u := \Sigma^{1/2}\delta$, where $\delta \ne 0$ is admissible to (1.12), and
show a uniform bound on the concentration of each individual random variable of the form $\|\Psi u\|_2^2 := \|X\delta\|_2^2$
around its mean. By Proposition 1.4, we have $\|u\|_2 = \|\Sigma^{1/2}\delta\|_2 > 0$. We can now apply (1.22) to each
$\delta/\|\Sigma^{1/2}\delta\|_2 \ne 0$, which belongs to $E'_s$ and hence to $E_s$ (see Remark 1.5), and conclude that

$$ 0 < (1 - \theta)\|\Sigma^{1/2}\delta\|_2 \le \frac{\|X\delta\|_2}{\sqrt{n}} \le (1 + \theta)\|\Sigma^{1/2}\delta\|_2 \tag{1.25} $$

holds for all $\delta \ne 0$ that are admissible to (1.12), with probability at least $1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4)$. Now the lower
bound in (1.25) implies that

$$ \frac{\|X\delta\|_2}{\sqrt{n}} \ge (1 - \theta)\|\Sigma^{1/2}\delta\|_2 \ge (1 - \theta)\frac{\|\delta_{T_0}\|_2}{K(s, k_0, \Sigma)} > 0, \tag{1.26} $$

where $T_0$ is the set of locations of the $s$ largest coefficients of $\delta$ in absolute value. Hence (1.23) and the event $\mathcal{R}(\theta)$ hold
simultaneously, with probability at least $1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4)$, given (1.13) and Proposition 1.4, so long as
$n$ satisfies (1.21).

Remark 1.7. It is clear that this result generalizes the notion of restricted isometry property (RIP)
introduced in Candes and Tao (2005). In particular, when $\Sigma = I$ and $\delta$ is $s$-sparse, (1.8) holds for $X$ with
$\theta_s = \theta$, given (1.25).



2 Proof of Theorem 1.6

In this section, we first state a definition and then two lemmas in Section 2.1, from which we show the proof
of Theorem 1.6 in Section 2.2. We shall identify the basis with the canonical basis $\{e_1, e_2, \ldots, e_p\}$ of $\mathbb{R}^p$,
where $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$, and it is to be understood that 1 appears in the $i$th position and 0 appears
elsewhere.

Definition 2.1. For a subset $V \subset \mathbb{R}^p$, we let

$$ \ell_*(V) = \mathbb{E} \sup_{t \in V} \left| \sum_{i=1}^p g_i t_i \right|, \tag{2.1} $$

where $t = (t_i)_{i=1}^p \in \mathbb{R}^p$ and $g_1, \ldots, g_p$ are independent $N(0, 1)$ Gaussian random variables.

2.1 The complexity measures

The subset $\Upsilon$ that is relevant to our result is a subset of the sphere $S^{p-1}$ such that the linear function
$\Sigma^{1/2} : E_s \to \mathbb{R}^p$ maps $\delta \in E_s$ onto:

$$ \Upsilon := \Sigma^{1/2}(E_s) = \{\upsilon \in \mathbb{R}^p : \upsilon = \Sigma^{1/2}\delta \text{ for some } \delta \in E_s\}. \tag{2.2} $$

We now show a bound on the functional $\ell_*(\Upsilon)$, for which we crucially exploit the cone property of vectors in
$E_s$, the RE condition on $\Sigma$, and the bound on $\rho_{\max}(s)$. Lemma 2.2 is one of the main technical contributions
of this paper.

Lemma 2.2. (Complexity of a subset of $S^{p-1}$) Let $\Sigma$ satisfy (1.12) and (1.16). Let $h_1, \ldots, h_p$ be independent
$N(0, 1)$ random variables. Let $1 \le s \le p/2$ be an integer. Then

$$ \ell_*(\Upsilon) := \mathbb{E} \sup_{\upsilon \in \Upsilon} \left| \sum_{i=1}^p h_i \upsilon_i \right| \le C\sqrt{s \log(cp/s)}, \tag{2.3} $$

where $C$ is defined in (1.19) and $c = 5e$.

Remark 2.3. We will also show our fundamental proof for the zero-mean Gaussian random ensemble
with covariance matrix $\Sigma$, where such a complexity measure is used exactly, in Section D. There we also
give explicit constants.

Now let $\Sigma^{1/2} := (\rho_{ij})$ and let $\rho_1, \ldots, \rho_p$ denote its $p$ column vectors. By definition of $\Sigma = (\Sigma^{1/2})^2$, it holds
that $\|\rho_i\|_2^2 = \sum_{j=1}^p \rho_{ij}^2 = \Sigma_{ii} = 1$, for all $i = 1, \ldots, p$. Thus we have the following.

Lemma 2.4. Let $\Phi = \{\rho_1, \ldots, \rho_p\}$ be the subset of vectors in $S^{p-1}$ that correspond to columns of $\Sigma^{1/2}$. It
holds that $\ell_*(\Phi) \le 3\sqrt{\log p}$.

2.2 Proof of Theorem 1.6

The key idea in proving Theorem 1.6 is to apply the powerful Theorem 2.5, as shown in Mendelson et al. (2007,
2008) (Corollary 2.7 and Theorem 2.1 respectively), to the subset $\Upsilon$ of the sphere $S^{p-1}$, as defined in (2.2). As
explained in Mendelson et al. (2008), in the context of Theorem 2.5, the functional $\ell_*(\Upsilon)$ is the complexity
measure of the set $\Upsilon$, which measures the extent to which probabilistic bounds on the concentration of each
individual random variable of the form $\|\Gamma\upsilon\|_2^2$ around its mean can be combined to form a bound that holds
uniformly for all $\upsilon \in \Upsilon$.

Theorem 2.5. (Mendelson et al., 2007, 2008) Set $1 \le n \le p$ and $0 < \theta < 1$. Let $\Psi$ be an isotropic $\psi_2$
random vector on $\mathbb{R}^p$ with constant $\alpha$, and $\Psi_1, \ldots, \Psi_n$ be independent copies of $\Psi$. Let $\Gamma$ be as defined
in (1.10) and let $V \subset S^{p-1}$. If $n$ satisfies

$$ n > \frac{c' \alpha^4}{\theta^2} \ell_*(V)^2, \tag{2.4} $$

then with probability at least $1 - \exp(-\bar{c}\theta^2 n/\alpha^4)$, for all $\upsilon \in V$, we have

$$ 1 - \theta \le \|\Gamma\upsilon\|_2/\sqrt{n} \le 1 + \theta, \tag{2.5} $$

where $c', \bar{c} > 0$ are absolute constants.

It is clear that (1.22) follows immediately from Theorem 2.5 by having $V = \Upsilon$, given Lemma 2.2. In fact,
we can now finish proving Theorem 1.6 by applying Theorem 2.5 twice, by having $V = \Upsilon$ and $V = \Phi$
respectively: the lower bound on $n$ is obtained by applying the upper bounds on $\ell_*(\Upsilon)$ as given in Lemma 2.2
and on $\ell_*(\Phi)$ as in Lemma 2.4. We then apply the union bound to bound the probability of the bad events
when (2.5) does not hold for some $\upsilon \in \Upsilon$ or some $\upsilon \in \Phi$ respectively. □
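For a finite set $V$, the complexity measure $\ell_*(V)$ of Definition 2.1 can be approximated by simulation. The sketch below is a plain Monte Carlo estimate, shown for the canonical basis of $\mathbb{R}^p$, where $\ell_*(V) = \mathbb{E}\max_i |g_i|$ and the classical bound $\sqrt{2\log(2p)}$ applies. The function name, the number of draws, the dimension, and the seed are hypothetical choices for illustration.

```python
import math
import random

def gaussian_complexity(V, num_draws, rng):
    """Monte Carlo estimate of E sup_{t in V} |sum_i g_i t_i| for a finite set V."""
    p = len(V[0])
    total = 0.0
    for _ in range(num_draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        total += max(abs(sum(gi * ti for gi, ti in zip(g, t))) for t in V)
    return total / num_draws

p = 30
# V is the canonical basis {e_1, ..., e_p}, so the supremum is max_i |g_i|.
basis = [[1.0 if i == j else 0.0 for i in range(p)] for j in range(p)]
estimate = gaussian_complexity(basis, num_draws=500, rng=random.Random(1))
upper = math.sqrt(2.0 * math.log(2 * p))  # classical bound on E max_i |g_i|
print(estimate, upper)
```

The estimate stays below $\sqrt{2\log(2p)}$, in line with the logarithmic growth in the size of the set that Lemma 2.4 records for the $p$ columns of $\Sigma^{1/2}$.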



3 $\ell_p$ convergence for the Lasso and the Dantzig selector



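Throughout this section, we assume that $0 < \theta < 1$, and $c', \bar{c} > 0$ are absolute constants. Conditioned on
the random design as in (1.11) satisfying properties as guaranteed in Theorem 1.6, we proceed to treat $X$ as
a deterministic design, for which both the RE condition as described in (1.24) and the condition $\mathcal{F}(\theta)$ defined
below hold:

$$ \mathcal{F}(\theta) := \left\{X : \forall j = 1, \ldots, p, \ 1 - \theta \le \frac{\|X_j\|_2}{\sqrt{n}} \le 1 + \theta\right\}, \tag{3.1} $$

where $X_1, \ldots, X_p$ are the column vectors of $X$. Formally, we consider the set $\mathcal{X} \ni X$ of random designs
that satisfy both conditions $\mathcal{R}(\theta)$ and $\mathcal{F}(\theta)$, for some $0 < \theta < 1$. By Theorem 1.6, we have, for $n$ satisfying the
lower bound in (1.21),

$$ \mathbb{P}(\mathcal{X}) \ge 1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4). $$

It is clear that on $\mathcal{X}$, Assumption 1.2 holds for $\Sigma$. We now bound the correlation between the noise and
covariates of $X$ for $X \in \mathcal{X}$, where we also define a constant $\lambda_{\sigma,a,p}$ which is used throughout the rest of this
paper. For each $a > 0$, for $X \in \mathcal{F}(\theta)$, let

$$ \mathcal{T}_a := \left\{\epsilon : \left\|\frac{X^T\epsilon}{n}\right\|_\infty \le (1 + \theta)\lambda_{\sigma,a,p}, \text{ where } X \in \mathcal{F}(\theta), \text{ for } 0 < \theta < 1\right\}, \tag{3.2} $$

where $\lambda_{\sigma,a,p} = \sigma\sqrt{1 + a}\sqrt{(2\log p)/n}$, where $a > 0$; we have (cf. Proposition C.1)

(3.3)

In fact, for such a bound to hold, we only need $\|X_j\|_2/\sqrt{n} \le 1 + \theta$, $\forall j$, to hold in $\mathcal{F}(\theta)$. We note that constants
in the theorems are not optimized.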
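The quantities in (3.2) are simple to compute for a given design and noise vector. The following sketch evaluates $\lambda_{\sigma,a,p}$ and checks whether a noise vector falls in the event $\mathcal{T}_a$; the function names and the $2 \times 2$ toy design (whose columns have norm exactly $\sqrt{n}$) are hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
import math

def lambda_sigma_a_p(sigma, a, p, n):
    """sigma * sqrt(1 + a) * sqrt(2 * log(p) / n), the constant defined after (3.2)."""
    return sigma * math.sqrt(1.0 + a) * math.sqrt(2.0 * math.log(p) / n)

def noise_in_T_a(X, eps, sigma, a, theta):
    """Check ||X^T eps / n||_inf <= (1 + theta) * lambda_{sigma,a,p}, as in (3.2)."""
    n, p = len(X), len(X[0])
    corr = max(abs(sum(X[i][j] * eps[i] for i in range(n))) / n for j in range(p))
    return corr <= (1.0 + theta) * lambda_sigma_a_p(sigma, a, p, n)

# Toy design (hypothetical): n = p = 2, both columns have l2 norm sqrt(2) = sqrt(n).
X = [[1.0, 1.0], [1.0, -1.0]]
print(lambda_sigma_a_p(1.0, 0.0, 2, 2))
print(noise_in_T_a(X, [0.1, -0.1], sigma=1.0, a=0.0, theta=0.1))
print(noise_in_T_a(X, [10.0, -10.0], sigma=1.0, a=0.0, theta=0.1))
```

A small noise vector keeps the correlation $\|X^T\epsilon/n\|_\infty$ under the threshold, while a large one violates it; the theorems below choose $\lambda_n$ as a multiple of this threshold.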

Theorem 3.1. (Estimation for the Lasso) Set $1 \le n \le p$, $0 < \theta < 1$, and $a > 0$. Let $s < p/2$. Consider
the linear model in (1.1) with random design $X := \Psi\Sigma^{1/2}$, where $\Psi_{n \times p}$ is a subgaussian random matrix
as defined in Theorem 1.6, and $\Sigma$ satisfies (1.12) and (1.16). Let $\hat\beta$ be an optimal solution to the Lasso as
in (1.2) with $\lambda_n \ge 2(1 + \theta)\lambda_{\sigma,a,p}$. Suppose that $n$ satisfies, for $C$ as in (1.19),

$$ n > \frac{c' \alpha^4}{\theta^2} \max\left(C^2 s \log(5ep/s),\ 9 \log p\right). \tag{3.4} $$

Then with probability at least $\mathbb{P}(\mathcal{X} \cap \mathcal{T}_a) \ge 1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4) - \mathbb{P}(\mathcal{T}_a^c)$, we have for
$B \le 4K^2(s, 3, \Sigma)/(1 - \theta)^2$ and $k_0 = 3$,

$$ \|\hat\beta - \beta\|_2 \le 2B\lambda_n\sqrt{s}, \quad \text{and} \quad \|\hat\beta - \beta\|_1 \le B\lambda_n s. \tag{3.5} $$

Theorem 3.2. (Estimation for the Dantzig selector) Set $1 \le n \le p$, $0 < \theta < 1$, and $a > 0$. Let $s < p/2$.
Consider the linear model in (1.1) with random design $X := \Psi\Sigma^{1/2}$, where $\Psi_{n \times p}$ is a subgaussian random
matrix as defined in Theorem 1.6, and $\Sigma$ satisfies (1.12) and (1.16). Let $\hat\beta$ be an optimal solution to the
Dantzig selector as in (1.3), where $\lambda_n \ge (1 + \theta)\lambda_{\sigma,a,p}$. Suppose that $n$ satisfies, for $C$ as in (1.19),

$$ n > \frac{c' \alpha^4}{\theta^2} \max\left(C^2 s \log(5ep/s),\ 9 \log p\right). \tag{3.6} $$

Then with probability at least $\mathbb{P}(\mathcal{X} \cap \mathcal{T}_a) \ge 1 - 2\exp(-\bar{c}\theta^2 n/\alpha^4) - \mathbb{P}(\mathcal{T}_a^c)$, we have for
$B \le 4K^2(s, 1, \Sigma)/(1 - \theta)^2$ and $k_0 = 1$,

$$ \|\hat\beta - \beta\|_2 \le 3B\lambda_n\sqrt{s}, \quad \text{and} \quad \|\hat\beta - \beta\|_1 \le 2B\lambda_n s. \tag{3.7} $$

Proofs are given in Section C.

A Some preliminary propositions

In this section, we first prove Proposition 1.4 in Section A.1, which is used throughout the rest of the paper.
We then present a simple decomposition for vectors $\delta \in E_s$ and show some immediate implications, which
we shall need in the proofs of Lemma 2.2, Theorem 3.1 and Theorem 3.2.



A.1 Proof of Proposition 1.4

For each $\delta$ that is admissible to (1.12), there exists a subset of indices $J_0 \subseteq \{1, \ldots, p\}$ such that both
$|J_0| \le s$ and $\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1$ hold. This immediately implies that (1.14) holds for $k_0 > 0$,

$$ \|\delta_{T_0^c}\|_1 = \|\delta\|_1 - \|\delta_{T_0}\|_1 \le \|\delta\|_1 - \|\delta_{J_0}\|_1 = \|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1 \le k_0\|\delta_{T_0}\|_1, $$

due to the maximality of $\|\delta_{T_0}\|_1$ among all $\|\delta_{J_0}\|_1$ for $J_0 \subseteq \{1, \ldots, p\}$ such that $|J_0| \le s$. This immediately
implies that $E'_s \subset E_s$.

We now show that (1.13) is a necessary and sufficient condition for (1.12) to hold; the same argument
applies to the RE conditions on $X$. Suppose (1.13) holds for $\delta \ne 0$; we have for all $J_0 \subseteq \{1, \ldots, p\}$ such
that $|J_0| \le s$ and $\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1$,

$$ \|\Sigma^{1/2}\delta\|_2 \ge \frac{\|\delta_{T_0}\|_2}{K(s, k_0, \Sigma)} \ge \frac{\|\delta_{J_0}\|_2}{K(s, k_0, \Sigma)} > 0, \tag{A.1} $$

where the last inequality is due to the fact that $\|\delta_{J_0}\|_2 > 0$: suppose $\|\delta_{J_0}\|_2 = 0$ otherwise; then
$\|\delta_{J_0^c}\|_1 \le k_0\|\delta_{J_0}\|_1 = 0$ would imply that $\delta = 0$, which is a contradiction. Conversely, suppose that (1.12) holds;
then (1.13) must also hold, given that $T_0$ satisfies (1.14) with $|T_0| = s$, and $\delta \ne 0$.

Finally, the "moreover" part holds given Assumption 1.2, in view of (A.1). □
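The first step of the proof, that a cone constraint on some $J_0$ with $|J_0| \le s$ carries over to the set $T_0$ of the $s$ largest coefficients, can be checked numerically. The sketch below uses a hypothetical vector and index set; the helper names are not from the paper.

```python
def l1_on(v, idx):
    """l1 norm of v restricted to the index collection idx."""
    return sum(abs(v[i]) for i in idx)

def cone_ratio(v, J):
    """Return ||v_{J^c}||_1 / ||v_J||_1 (infinite if v vanishes on J)."""
    J = set(J)
    inside = l1_on(v, J)
    outside = l1_on(v, [i for i in range(len(v)) if i not in J])
    return float("inf") if inside == 0 else outside / inside

def top_s_support(v, s):
    """Indices T0 of the s largest entries of v in absolute value."""
    return sorted(range(len(v)), key=lambda i: -abs(v[i]))[:s]

v = [0.5, -3.0, 0.2, 2.0, -0.1, 0.4]   # hypothetical vector
J0 = [0, 1]                            # some index set with |J0| <= s = 2
T0 = top_s_support(v, 2)               # indices of the two largest |v_i|

# Any k0 that works for J0 also works for T0, as in (1.14).
print(cone_ratio(v, J0), cone_ratio(v, T0))
```

Here the smallest admissible $k_0$ for $J_0$ is $2.7/3.5$, while for $T_0 = \{1, 3\}$ (zero-based indices) it drops to $1.2/5$, consistent with the maximality argument above.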



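A.2 Decomposing a vector in $E_s$

For each $\delta \in E_s$, we decompose $\delta$ into a set of vectors $\delta_{T_0}, \delta_{T_1}, \delta_{T_2}, \ldots, \delta_{T_K}$ such that $T_0$ corresponds to
the locations of the $s$ largest coefficients of $\delta$ in absolute value, $T_1$ corresponds to the locations of the $s$ largest
coefficients of $\delta_{T_0^c}$ in absolute value, $T_2$ corresponds to the locations of the next $s$ largest coefficients of $\delta_{T_0^c}$ in
absolute value, and so on. Hence we have $T_0^c = \bigcup_{k=1}^K T_k$, where $K \ge 1$, $|T_k| = s$, $\forall k = 1, \ldots, K - 1$,
and $|T_K| \le s$. Now for each $j \ge 1$, we have

$$ \|\delta_{T_j}\|_2 \le \sqrt{s}\,\|\delta_{T_j}\|_\infty \le \frac{1}{\sqrt{s}}\|\delta_{T_{j-1}}\|_1, $$

where $\|\cdot\|_\infty$ represents the largest entry in absolute value in the vector, and hence

$$ \sum_{k \ge 1} \|\delta_{T_k}\|_2 \le s^{-1/2}\left(\|\delta_{T_0}\|_1 + \|\delta_{T_1}\|_1 + \|\delta_{T_2}\|_1 + \ldots\right) $$

$$ \le s^{-1/2}\left(\|\delta_{T_0}\|_1 + \|\delta_{T_0^c}\|_1\right) = s^{-1/2}\|\delta\|_1 \tag{A.2} $$

$$ \le s^{-1/2}(k_0 + 1)\|\delta_{T_0}\|_1 \le (k_0 + 1)\|\delta_{T_0}\|_2, \tag{A.3} $$

where for (A.3), we have used the fact that for all $\delta \in E_s$

$$ \|\delta_{T_0^c}\|_1 \le k_0\|\delta_{T_0}\|_1 \tag{A.4} $$

holds. Indeed, for $\delta$ such that (A.4) holds, we have by (A.2) and (A.3)

$$ \|\delta\|_2 \le \|\delta_{T_0}\|_2 + \sum_{j \ge 1}\|\delta_{T_j}\|_2 \le \|\delta_{T_0}\|_2 + s^{-1/2}\|\delta\|_1 \tag{A.5} $$

$$ \le (k_0 + 2)\|\delta_{T_0}\|_2. \tag{A.6} $$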
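The block decomposition and the resulting inequalities (A.2) and (A.6) can be verified on a small example. The following sketch is an illustration under assumed inputs: the vector, the block size $s$, and the helper names are hypothetical, and $k_0$ is taken as the smallest value for which (A.4) holds for this vector.

```python
import math

def block_decomposition(delta, s):
    """Split indices into T0, T1, ..., each holding the next s largest |entries|."""
    order = sorted(range(len(delta)), key=lambda i: -abs(delta[i]))
    return [order[k:k + s] for k in range(0, len(order), s)]

def l2_on(delta, idx):
    """l2 norm of delta restricted to the index collection idx."""
    return math.sqrt(sum(delta[i] ** 2 for i in idx))

delta = [4.0, -1.0, 0.5, 3.0, -2.0, 0.25, 1.5]  # hypothetical vector
s = 2
blocks = block_decomposition(delta, s)

tail_l2_sum = sum(l2_on(delta, T) for T in blocks[1:])  # sum over k >= 1 of ||delta_{T_k}||_2
l1_norm = sum(abs(x) for x in delta)
t0_l1 = sum(abs(delta[i]) for i in blocks[0])
k0 = (l1_norm - t0_l1) / t0_l1       # smallest k0 for which (A.4) holds here
l2_norm = l2_on(delta, range(len(delta)))

print(tail_l2_sum, l1_norm / math.sqrt(s))        # left and right sides of (A.2)
print(l2_norm, (k0 + 2) * l2_on(delta, blocks[0]))  # left and right sides of (A.6)
```

Both bounds hold with room to spare on this example, which reflects that (A.2) uses only the ordering of the blocks and not the cone constraint.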



A.3 On the equivalence of two RE conditions

To introduce the second RE assumption by Bickel et al. (2009), we need some more notation. For an integer
$s$ such that $1 \le s \le p/2$, a vector $\upsilon \in \mathbb{R}^p$ and a set of indices $J_0 \subseteq \{1, \ldots, p\}$ with $|J_0| \le s$, denote by $J_1$
the subset of $\{1, \ldots, p\}$ corresponding to the $s$ largest in absolute value coordinates of $\upsilon$ outside of $J_0$, and
define $J_{01} = J_0 \cup J_1$.

Assumption A.1. (Restricted eigenvalue assumption $RE(s, s, k_0, X)$ (Bickel et al., 2009)) Consider a
fixed design. For some integer $1 \le s \le p/2$ and a positive number $k_0$, the following condition holds:

$$ \frac{1}{K(s, s, k_0, X)} := \min_{\substack{J_0 \subseteq \{1,\ldots,p\} \\ |J_0| \le s}} \; \min_{\substack{\upsilon \ne 0 \\ \|\upsilon_{J_0^c}\|_1 \le k_0 \|\upsilon_{J_0}\|_1}} \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_{01}}\|_2} > 0. \tag{A.7} $$

Assumption A.2. (Restricted eigenvalue assumption $RE(s, s, k_0, \Sigma)$) For some integer $1 \le s \le p/2$ and
a positive number $k_0$, the following condition holds:

$$ \frac{1}{K(s, s, k_0, \Sigma)} := \min_{\substack{J_0 \subseteq \{1,\ldots,p\} \\ |J_0| \le s}} \; \min_{\substack{\upsilon \ne 0 \\ \|\upsilon_{J_0^c}\|_1 \le k_0 \|\upsilon_{J_0}\|_1}} \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{J_{01}}\|_2} > 0. \tag{A.8} $$



Proposition A.1. For some integer $1 \le s \le p/2$, and for the same $k_0 > 0$, the two sets of RE conditions
are equivalent up to a constant factor of $\sqrt{2}$:

$$ \frac{K(s, s, k_0, \Sigma)}{\sqrt{2}} \le K(s, k_0, \Sigma) \le K(s, s, k_0, \Sigma). $$

Similarly, we have

$$ \frac{K(s, s, k_0, X)}{\sqrt{2}} \le K(s, k_0, X) \le K(s, s, k_0, X). $$

Proof. It is obvious that for the same $k_0 > 0$, (A.8) implies that the condition as in Assumption 1.2 holds
with

$$ K(s, k_0, \Sigma) \le K(s, s, k_0, \Sigma). $$

Now, for the other direction, suppose that $RE(s, k_0, \Sigma)$ holds for $K(s, k_0, \Sigma) > 0$. Then for all $\upsilon \ne 0$
that are admissible to (1.12), we have by Proposition 1.4,

$$ \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{T_0}\|_2} \ge \frac{1}{K(s, k_0, \Sigma)} > 0, \tag{A.9} $$

where $T_0$ corresponds to the locations of the $s$ largest coefficients of $\upsilon$ in absolute value. Now for any $J_0 \subseteq
\{1, \ldots, p\}$ such that $|J_0| \le s$ and $\|\upsilon_{J_0^c}\|_1 \le k_0\|\upsilon_{J_0}\|_1$ holds, we have by (1.12),

$$ \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{J_0}\|_2} \ge \frac{1}{K(s, k_0, \Sigma)} > 0. \tag{A.10} $$

Now it is clear that $J_1 \subset T_0 \cup T_1$, and we have for all $\upsilon \ne 0$ that are admissible to (1.12),

$$ 0 < \|\upsilon_{J_{01}}\|_2^2 = \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{J_1}\|_2^2 \tag{A.11} $$

$$ \le \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{T_0}\|_2^2 \tag{A.12} $$

$$ \le 2K^2(s, k_0, \Sigma)\|\Sigma^{1/2}\upsilon\|_2^2, \tag{A.13} $$

which immediately implies that for all $\upsilon \ne 0$ that are admissible to (1.12),

$$ \frac{\|\Sigma^{1/2}\upsilon\|_2}{\|\upsilon_{J_{01}}\|_2} \ge \frac{1}{\sqrt{2}K(s, k_0, \Sigma)} > 0. $$

Thus we have that the $RE(s, s, k_0, \Sigma)$ condition holds with $K(s, s, k_0, \Sigma) \le \sqrt{2}K(s, k_0, \Sigma)$. The other set of
inequalities follows by exactly the same line of arguments. □

We now introduce the last assumption, for which we need some more notation. For integers $s, m$ such that
$1 \le s \le p/2$ and $m \ge s$, $s + m \le p$, a vector $\delta \in \mathbb{R}^p$ and a set of indices $J_0 \subseteq \{1, \ldots, p\}$ with $|J_0| \le s$,
denote by $J_m$ the subset of $\{1, \ldots, p\}$ corresponding to the $m$ largest in absolute value coordinates of $\delta$
outside of $J_0$, and define $J_{0m} = J_0 \cup J_m$.

Assumption A.3. (Restricted eigenvalue assumption $RE(s, m, k_0, X)$ (Bickel et al., 2009)) For some
integer $1 \le s \le p/2$, $m \ge s$, $s + m \le p$, and a positive number $k_0$, the following condition holds:

$$ \frac{1}{K(s, m, k_0, X)} := \min_{\substack{J_0 \subseteq \{1,\ldots,p\} \\ |J_0| \le s}} \; \min_{\substack{\upsilon \ne 0 \\ \|\upsilon_{J_0^c}\|_1 \le k_0 \|\upsilon_{J_0}\|_1}} \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_{0m}}\|_2} > 0. \tag{A.14} $$

Proposition A.2. For some integer $1 \le s \le p/2$, $m \ge s$, $s + m \le p$, and some positive number $k_0$, we have

$$ \frac{K(s, m, k_0, X)}{\sqrt{2 + k_0^2}} \le K(s, k_0, X) \le K(s, m, k_0, X). $$

Proof. It is clear that $K(s, k_0, X) \le K(s, m, k_0, X)$ for $m \ge s$. Now suppose that $RE(s, k_0, X)$ holds; we
continue from (A.10). We divide $J_m$ into $J_1, J_2, \ldots$, such that $J_1$ corresponds to the locations of the
$s$ largest coefficients of $\upsilon_{J_0^c}$ in absolute value, $J_2$ corresponds to the locations of the next $s$ largest coefficients
of $\upsilon_{J_0^c}$ in absolute value, and so on. We first bound $\|\upsilon_{J_{01}^c}\|_2^2$, following essentially the same argument as
in Candes and Tao (2007): observe that the $k$th largest value of $\upsilon_{J_0^c}$ obeys

$$ |\upsilon_{J_0^c}|_{(k)} \le \|\upsilon_{J_0^c}\|_1 / k. $$

Thus we have, for $\upsilon$ that is admissible to (A.14),

$$ \|\upsilon_{J_{01}^c}\|_2^2 \le \|\upsilon_{J_0^c}\|_1^2 \sum_{k \ge s+1} \frac{1}{k^2} \le s^{-1}\|\upsilon_{J_0^c}\|_1^2 \le s^{-1}k_0^2\|\upsilon_{J_0}\|_1^2 \le k_0^2\|\upsilon_{J_0}\|_2^2. $$

It is clear that $\|\upsilon_{J_1}\|_2 \le \|\upsilon_{T_0}\|_2$ and

$$ 0 < \|\upsilon_{J_{01}}\|_2^2 \le \|\upsilon_{J_{0m}}\|_2^2 \le \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{J_1}\|_2^2 + \|\upsilon_{J_{01}^c}\|_2^2 $$

$$ \le \|\upsilon_{J_0}\|_2^2 + \|\upsilon_{J_1}\|_2^2 + k_0^2\|\upsilon_{J_0}\|_2^2 $$

$$ \le (1 + k_0^2)\|\upsilon_{J_0}\|_2^2 + \|\upsilon_{T_0}\|_2^2 $$

$$ \le (2 + k_0^2)K^2(s, k_0, X)\|X\upsilon\|_2^2/n, $$

which immediately implies that for all $\upsilon \ne 0$ that are admissible to (A.14),

$$ \frac{\|X\upsilon\|_2}{\sqrt{n}\,\|\upsilon_{J_{0m}}\|_2} \ge \frac{1}{\sqrt{2 + k_0^2}\,K(s, k_0, X)} > 0. $$

Thus we have that the $RE(s, m, k_0, X)$ condition holds with $K(s, m, k_0, X) \le \sqrt{2 + k_0^2}\,K(s, k_0, X)$. □

B Results on the complexity measures

In this section, in preparation for proving Lemma 2.2 and Lemma 2.4, we first state some well-known
definitions and some preliminary results on certain complexity measures on a set $V$ (see Mendelson et al.
(2008) for example); we also provide a new result in Lemma B.6.

Definition B.1. Given a subset $U \subset \mathbb{R}^p$ and a number $\varepsilon > 0$, an $\varepsilon$-net $\Pi$ of $U$ with respect to the Euclidean
metric is a subset of points of $U$ such that the $\varepsilon$-balls centered at $\Pi$ cover $U$:

$$ U \subset \bigcup_{x \in \Pi} (x + \varepsilon B_2^p), $$

where $A + B := \{a + b : a \in A, b \in B\}$ is the Minkowski sum of the sets $A$ and $B$. The covering number
$N(U, \varepsilon)$ is the smallest cardinality of an $\varepsilon$-net of $U$.

Now it is well-known that there exists an absolute constant $c_1 > 0$ such that for every finite subset $\Pi \subset B_2^p$,

$$ \ell_*(\mathrm{conv}\, \Pi) = \ell_*(\Pi) \le c_1\sqrt{\log|\Pi|}. \tag{B.1} $$

The main goal of the rest of this section is to provide a bound on a variation of the complexity measure
$\ell_*(V)$, which we will denote by $\tilde\ell_*(V)$ throughout this paper, by essentially exploiting a bound similar
to (B.1) (cf. Lemma B.6).

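Given a set $V \subset \mathbb{R}^p$, we need to also measure $\ell_*(W)$, where $W$ is the subset of $\mathbb{R}^p$ onto which the linear
map $\Sigma^{1/2} : V \to \mathbb{R}^p$ carries $t \in V$:

$$W := \Sigma^{1/2}(V) = \{w \in \mathbb{R}^p : w = \Sigma^{1/2} t \text{ for some } t \in V\}.$$

We denote this new measure with $\tilde\ell_*(V)$. Formally,
Definition B.2. For a subset $V \subset \mathbb{R}^p$, we define

$$\tilde\ell_*(V) := \ell_*(\Sigma^{1/2}(V)) := \mathbb{E} \sup_{t \in V} \left| \langle t, \Sigma^{1/2} h \rangle \right| = \mathbb{E} \sup_{t \in V} \left| \sum_{i=1}^p g_i t_i \right| \tag{B.2}$$

where $t = (t_i)_{i=1}^p \in \mathbb{R}^p$, $h = (h_i)_{i=1}^p$ is a random vector with independent $N(0, 1)$ random
variables while $g = \Sigma^{1/2} h$ is a random vector with dependent Gaussian random variables.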

We prove a bound on this measure in Lemma B.6 after we present some existing results. The subsets to which
we would like to apply (2.1) and (B.2) are the sets consisting of sparse vectors: let $S^{p-1}$ be the unit sphere
in $\mathbb{R}^p$; for $1 \le m \le p$,

$$U_m := \{x \in S^{p-1} : |\mathrm{supp}(x)| \le m\}. \tag{B.3}$$

We shall also consider the analogous subset of the Euclidean ball,

$$\tilde U_m := \{x \in B_2^p : |\mathrm{supp}(x)| \le m\}. \tag{B.4}$$

The sets $U_m$ and $\tilde U_m$ are unions of the unit spheres, and unit balls, respectively, supported on $m$-dimensional
coordinate subspaces of $\mathbb{R}^p$. The following three lemmas are well-known and mostly standard; see Mendelson et al.
(2008) and Ledoux and Talagrand (1991) for example.
Lemma B.3. (Mendelson et al. (2008, Lemma 2.2)) Given $m \ge 1$ and $\varepsilon > 0$. There exists an $\varepsilon$ cover $\Pi \subset B_2^m$
of $B_2^m$ with respect to the Euclidean metric such that $B_2^m \subset (1 - \varepsilon)^{-1} \mathrm{conv}\, \Pi$ and $|\Pi| \le (1 + 2/\varepsilon)^m$.
Similarly, there exists an $\varepsilon$ cover of the sphere $S^{m-1}$, $\Pi' \subset S^{m-1}$, such that $|\Pi'| \le (1 + 2/\varepsilon)^m$.

Lemma B.4. (Mendelson et al. (2008, Lemma 2.3)) For every $0 < \varepsilon \le 1/2$ and every $1 \le m \le p$, there is
a set $\Pi \subset B_2^p$ which is an $\varepsilon$ cover of $\tilde U_m$, such that

$$\tilde U_m \subset 2\, \mathrm{conv}\, \Pi, \quad \text{where } |\Pi| \le \left( \frac{5}{2\varepsilon} \right)^m \binom{p}{m}. \tag{B.5}$$

Moreover, there exists an $\varepsilon$ cover $\Pi' \subset S^{p-1}$ of $U_m$ with cardinality at most $\left( \frac{5}{2\varepsilon} \right)^m \binom{p}{m}$.

Proof. Consider all subsets $T \subset \{1, \ldots, p\}$ with $|T| = m$; it is clear that the required sets $\Pi$ and $\Pi'$
in Lemma B.4 can be obtained by unions of corresponding sets supported on the coordinates from $T$. By
Lemma B.3, the cardinalities of these sets are at most $(5/(2\varepsilon))^m \binom{p}{m}$. □
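
As a quick sanity check on the counting in Lemma B.4, the sketch below compares, on a logarithmic scale, the cardinality bound $(5/(2\varepsilon))^m \binom{p}{m}$ at $\varepsilon = 1/2$ with the simplified form $(5ep/m)^m$ that is used later with $c = 5e$. The function names and the value of `p` are assumptions made for illustration and do not come from the paper.

```python
import math


def log_cover_cardinality(p, m, eps):
    """log of (5 / (2 eps))^m * C(p, m), the cardinality bound in (B.5)."""
    return m * math.log(5.0 / (2.0 * eps)) + math.log(math.comb(p, m))


def log_simplified(p, m):
    """log of (5 e p / m)^m, the simplified bound for eps = 1/2."""
    return m * math.log(5.0 * math.e * p / m)


p = 1000  # hypothetical ambient dimension
# Check the simplification for every sparsity level 1 <= m < p / 2.
simplification_holds = all(
    log_cover_cardinality(p, m, 0.5) <= log_simplified(p, m)
    for m in range(1, p // 2)
)
print(simplification_holds)
```

The comparison only uses the elementary bound $\binom{p}{m} \le (ep/m)^m$; working with logarithms avoids overflow for large $m$.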

Lemma B.5. (Ledoux and Talagrand, 1991) Let $X = (X_1, \ldots, X_N)$ be Gaussian in $\mathbb{R}^N$. Then

$$\mathbb{E} \max_{i=1,\ldots,N} |X_i| \le 3 \sqrt{\log N} \max_{i=1,\ldots,N} \sqrt{\mathbb{E} X_i^2}.$$
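
A small Monte Carlo experiment illustrates Lemma B.5. This is a sketch under simplifying assumptions: the coordinates are taken to be independent centered Gaussians with hand-picked standard deviations (the lemma itself does not require independence), and the names `mean_max_abs`, `sigmas` and the trial count are hypothetical.

```python
import math
import random


def mean_max_abs(sigmas, trials, rng):
    """Monte Carlo estimate of E max_i |X_i| for independent X_i ~ N(0, sigma_i^2)."""
    total = 0.0
    for _ in range(trials):
        total += max(abs(rng.gauss(0.0, s)) for s in sigmas)
    return total / trials


rng = random.Random(1)
# Standard deviations cycling through 0.5, 1.0, 1.5, 2.0 (illustrative choice).
sigmas = [0.5 + 0.5 * (i % 4) for i in range(64)]
estimate = mean_max_abs(sigmas, 2000, rng)
# Right-hand side of Lemma B.5: 3 * sqrt(log N) * max_i sqrt(E X_i^2).
bound = 3.0 * math.sqrt(math.log(len(sigmas))) * max(sigmas)
print(estimate <= bound)
```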



We now prove the key lemma that we need for Lemma D.2. The main point of the proof follows the idea
from Mendelson et al. (2008): if $\tilde U_m \subset 2\, \mathrm{conv}\, \Pi_m$ for $\Pi_m \subset B_2^p$ and there is a reasonable control of the
cardinality of $\Pi_m$ and of $\rho_{\max}(m)$ on $\Sigma$, then $\tilde\ell_*(V)$ is bounded from above.

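Lemma B.6. Let $\Pi_m$ be a $1/2$-cover of $\tilde U_m$ provided by Lemma B.4. Then for $1 \le m < p/2$ and $c = 5e$, it
holds that for $V = U_m$

\begin{align*}
\tilde\ell_*(U_m) &\le \tilde\ell_*(2\, \mathrm{conv}\, \Pi_m) = 2 \tilde\ell_*(\Pi_m), \quad \text{where} \tag{B.6} \\
\tilde\ell_*(\Pi_m) &\le 3 \sqrt{m \log (c p / m)}\, \sqrt{\rho_{\max}(m)}. \tag{B.7}
\end{align*}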


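Proof. The first inequality follows from the definition of $\tilde\ell_*$ and the fact that

$$V = U_m \subset \tilde U_m \subset 2\, \mathrm{conv}\, \Pi_m.$$

The second equality in (B.6) holds due to convexity, which guarantees that

$$\sup_{y \in \mathrm{conv}\, \Pi_m} \left| \langle y, \Sigma^{1/2} h \rangle \right| = \sup_{t \in \Pi_m} \left| \langle t, \Sigma^{1/2} h \rangle \right|,$$

and hence

$$\tilde\ell_*(2\, \mathrm{conv}\, \Pi_m) = 2 \tilde\ell_*(\mathrm{conv}\, \Pi_m) = 2\, \mathbb{E} \sup_{t \in \Pi_m} \left| \langle t, \Sigma^{1/2} h \rangle \right| = 2 \tilde\ell_*(\Pi_m).$$

Thus we have for $c = 5e$

\begin{align*}
\tilde\ell_*(\Pi_m) &\le 3 \sqrt{\log |\Pi_m|} \sup_{t \in \Pi_m} \sqrt{\mathbb{E} \left| \langle t, \Sigma^{1/2} h \rangle \right|^2} \\
&\le 3 \sqrt{m \log (5ep/m)} \sup_{t \in \Pi_m} \|\Sigma^{1/2} t\|_2 \\
&\le 3 \sqrt{m \log (c p / m)}\, \sqrt{\rho_{\max}(m)},
\end{align*}

where we have used Lemma B.5, (B.5) and the bound $\binom{p}{m} \le \left( \frac{ep}{m} \right)^m$, which is valid for $m < p/2$, and the
fact that $\mathbb{E} |\langle t, \Sigma^{1/2} h \rangle|^2 = \mathbb{E} |\langle h, \Sigma^{1/2} t \rangle|^2 = \|\Sigma^{1/2} t\|_2^2$. □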
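
To see how loose or tight the bound of Lemma B.6 is in a simple case, one can estimate $\tilde\ell_*(U_m)$ by simulation. The sketch below makes a strong simplifying assumption that is not in the paper: $\Sigma$ is diagonal, so that $g = \Sigma^{1/2} h$ has independent coordinates and $\rho_{\max}(m)$ equals the largest diagonal entry. In that case the supremum of $|\langle t, g \rangle|$ over $m$-sparse unit vectors is the $\ell_2$ norm of the $m$ largest $|g_i|$. All function names and numeric values are hypothetical.

```python
import math
import random


def sup_over_sparse_sphere(g, m):
    """sup of |<t, g>| over m-sparse unit vectors t: l2 norm of the m largest |g_i|."""
    top = sorted((x * x for x in g), reverse=True)[:m]
    return math.sqrt(sum(top))


def estimate_ell_tilde(diag_sigma, m, trials, rng):
    """Monte Carlo estimate of E sup_{t in U_m} |<t, Sigma^{1/2} h>| for diagonal Sigma."""
    roots = [math.sqrt(d) for d in diag_sigma]
    total = 0.0
    for _ in range(trials):
        g = [r * rng.gauss(0.0, 1.0) for r in roots]  # g = Sigma^{1/2} h
        total += sup_over_sparse_sphere(g, m)
    return total / trials


def lemma_b6_bound(p, m, rho_max):
    """2 * 3 * sqrt(m log(5 e p / m)) * sqrt(rho_max), combining (B.6) and (B.7)."""
    return 6.0 * math.sqrt(m * math.log(5.0 * math.e * p / m)) * math.sqrt(rho_max)


rng = random.Random(2)
p, m = 100, 5
diag_sigma = [0.5 + 1.5 * i / (p - 1) for i in range(p)]  # variances in [0.5, 2]
estimate = estimate_ell_tilde(diag_sigma, m, 500, rng)
bound = lemma_b6_bound(p, m, max(diag_sigma))
print(estimate <= bound)
```

For a non-diagonal $\Sigma$ the supremum no longer has this closed form, and $\rho_{\max}(m)$ would have to be computed over all $m \times m$ principal submatrices.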



B.1 Proof of Lemma 2.2

It is clear that for all $y \in \Upsilon$, $y = \Sigma^{1/2} \delta$ for some $\delta \in E_s$, hence all equalities in (2.3) hold. We hence focus
on bounding the last term. For each $\delta \in E_s$, we decompose $\delta$ into a set of vectors $\delta_{T_0}, \delta_{T_1}, \delta_{T_2}, \ldots, \delta_{T_K}$
as in Section A.2.

By Proposition 1.4, we have $\|\delta_{T_0^c}\|_1 \le k_0 \|\delta_{T_0}\|_1$.

For each index set $T \subset \{1, \ldots, p\}$, we let $\delta_T$ represent its 0-extended version $\delta'$ in $\mathbb{R}^p$, such that $\delta'_{T^c} = 0$
and $\delta'_T = \delta_T$. For $\delta_T = 0$, it is understood that $\delta_T / \|\delta_T\|_2 := 0$ below. Thus we have for all $\delta$ in $E_s$ and all
$h \in \mathbb{R}^p$,



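\begin{align*}
\left| \langle h, \Sigma^{1/2} \delta \rangle \right| &\le \left| \langle \delta_{T_0}, \Sigma^{1/2} h \rangle \right| + \sum_{k \ge 1} \left| \langle \delta_{T_k}, \Sigma^{1/2} h \rangle \right| \\
&\le \|\delta_{T_0}\|_2 \left| \left\langle \frac{\delta_{T_0}}{\|\delta_{T_0}\|_2}, \Sigma^{1/2} h \right\rangle \right| + \sum_{k \ge 1} \|\delta_{T_k}\|_2 \left| \left\langle \frac{\delta_{T_k}}{\|\delta_{T_k}\|_2}, \Sigma^{1/2} h \right\rangle \right| \\
&\le \left( \|\delta_{T_0}\|_2 + \sum_{k \ge 1} \|\delta_{T_k}\|_2 \right) \sup_{t \in U_s} \langle h, \Sigma^{1/2} t \rangle \\
&\le (k_0 + 2) K(s, k_0, \Sigma) \sup_{t \in U_s} \langle h, \Sigma^{1/2} t \rangle, \tag{B.8}
\end{align*}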



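where we have used the following bounds in (B.9) and (B.10): By Assumption 1.2 and by construction of
its corresponding sets $T_0, T_1, \ldots$, we have for all $\delta \in E_s$,

\begin{align*}
\|\delta_{T_0}\|_2 &\le K(s, k_0, \Sigma), \tag{B.9} \\
\sum_{k \ge 1} \|\delta_{T_k}\|_2 &\le (1 + k_0) \|\delta_{T_0}\|_2 \le (1 + k_0) K(s, k_0, \Sigma), \tag{B.10}
\end{align*}

where we used the bound in (A.3). Thus we have by (B.8) and Lemma B.6

\begin{align*}
\mathbb{E} \sup_{\delta \in E_s} \langle h, \Sigma^{1/2} \delta \rangle &\le (2 + k_0) K(s, k_0, \Sigma)\, \mathbb{E} \sup_{t \in U_s} \langle h, \Sigma^{1/2} t \rangle \\
&\le (2 + k_0) K(s, k_0, \Sigma)\, \tilde\ell_*(U_s) \\
&\le 3 (2 + k_0) K(s, k_0, \Sigma) \sqrt{s \log (cp/s)}\, \sqrt{\rho_{\max}(s)} \\
&:= C \sqrt{s \log (cp/s)}
\end{align*}

by Lemma B.6, where $C$ is as defined in (1.19) and $c = 5e$. This proves Lemma 2.2. □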



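B.2 Proof of Lemma 2.4

Let $h_1, \ldots, h_p$ be independent $N(0, 1)$ Gaussian random variables. We have by Lemma B.5,

\begin{align*}
\ell_*(\Phi) := \mathbb{E} \max_{i=1,\ldots,p} \left| \sum_{j=1}^p \Sigma^{1/2}_{ij} h_j \right| &\le 3 \sqrt{\log p} \max_{i=1,\ldots,p} \sqrt{\mathbb{E} \Big( \sum_{j=1}^p \Sigma^{1/2}_{ij} h_j \Big)^2} \\
&= 3 \sqrt{\log p} \max_{i=1,\ldots,p} \sqrt{\Sigma_{ii}} = 3 \sqrt{\log p},
\end{align*}

where we used the fact that $\Sigma_{ii} = 1$ for all $i$ and $\sigma(h_j) = 1$, $\forall j$. □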



C Proofs for Theorems in Section 3

Throughout this section, let $1 > \theta > 0$. Proving both Theorem 3.1 and Theorem 3.2 involves first showing
that the optimal solutions to both the Lasso and the Dantzig selector satisfy the cone constraint as in (1.4) for
$I = \mathrm{supp}\, \beta$, for some $k_0 > 0$. Indeed, it holds that $k_0 = 1$ for the Dantzig selector when $\lambda_n \ge (1 + \theta) \lambda_{\sigma,a,p}$,
and $k_0 = 3$ for the Lasso when $\lambda_n \ge 2(1 + \theta) \lambda_{\sigma,a,p}$ (cf. Lemma C.2 and (C.14)). These have been
shown before, for example, in Bickel et al. (2009) and in Candes and Tao (2007). We include proofs of
Lemma C.2 and (C.14) for completeness. We then state two propositions for the Lasso estimator and the
Dantzig selector respectively under $\mathcal{T}_a$, where $a \ge 0$ and $1 > \theta > 0$. We first bound the probability of $\mathcal{T}_a^c$.

C.1 Bounding $\mathcal{T}_a^c$



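Lemma C.1. For fixed design $X$ with $\max_j \|X_j\|_2 \le (1 + \theta) \sqrt{n}$, where $0 < \theta < 1$, we have for $\mathcal{T}_a$ as
defined in (3.2), where $a \ge 0$, $\mathbb{P}(\mathcal{T}_a^c) \le \left( \sqrt{\pi \log p}\, p^a \right)^{-1}$.

Proof. Define random variables $Y_j = \frac{1}{n} \sum_{i=1}^n \epsilon_i X_{i,j}$. Note that $\max_{1 \le j \le p} |Y_j| = \|X^T \epsilon / n\|_\infty$. We have
$\mathbb{E}(Y_j) = 0$ and $\mathrm{Var}(Y_j) = \|X_j\|_2^2 \sigma^2 / n^2 \le (1 + \theta)^2 \sigma^2 / n$. Let $c_0 = 1 + \theta$. Obviously, $Y_j$ has its tail
probability dominated by that of $Z \sim N(0, c_0^2 \sigma^2 / n)$:

$$\mathbb{P}(|Y_j| \ge t) \le \mathbb{P}(|Z| \ge t) \le \frac{\sqrt{2}\, c_0 \sigma}{\sqrt{\pi n}\, t} \exp\left( -\frac{n t^2}{2 c_0^2 \sigma^2} \right).$$

We can now apply the union bound to obtain:

\begin{align*}
\mathbb{P}\left( \max_{1 \le j \le p} |Y_j| \ge t \right) &\le p\, \frac{\sqrt{2}\, c_0 \sigma}{\sqrt{\pi n}\, t} \exp\left( -\frac{n t^2}{2 c_0^2 \sigma^2} \right) \\
&= \exp\left( -\left( \frac{n t^2}{2 c_0^2 \sigma^2} + \log \frac{t \sqrt{\pi n}}{\sqrt{2}\, c_0 \sigma} - \log p \right) \right).
\end{align*}

By choosing $t = c_0 \sigma \sqrt{1 + a} \sqrt{2 \log p / n}$, the right-hand side is bounded by $\left( \sqrt{\pi \log p}\, p^a \right)^{-1}$ for $a \ge 0$. □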
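
The probability bound of Lemma C.1 can be compared with an empirical frequency. The sketch below is a simulation under stated assumptions rather than anything taken from the paper: the fixed design is a Gaussian draw whose columns are rescaled to have $\ell_2$ norm exactly $\sqrt{n}$ (so the column-norm condition holds for any $\theta > 0$), the threshold is the value of $t$ chosen in the proof, and the function names and all numeric values are hypothetical.

```python
import math
import random


def simulate_failure_rate(n, p, sigma, a, theta, trials, rng):
    """Empirical frequency of max_j |X_j^T eps / n| exceeding the threshold t."""
    # Fixed design: Gaussian entries, columns rescaled to l2 norm sqrt(n).
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    for j in range(p):
        norm_j = math.sqrt(sum(X[i][j] ** 2 for i in range(n)))
        for i in range(n):
            X[i][j] *= math.sqrt(n) / norm_j
    # Threshold from the proof: t = c0 * sigma * sqrt(1 + a) * sqrt(2 log p / n).
    t = (1.0 + theta) * sigma * math.sqrt(1.0 + a) * math.sqrt(2.0 * math.log(p) / n)
    failures = 0
    for _ in range(trials):
        eps = [rng.gauss(0.0, sigma) for _ in range(n)]
        max_abs = max(
            abs(sum(X[i][j] * eps[i] for i in range(n))) / n for j in range(p)
        )
        if max_abs > t:
            failures += 1
    return failures / trials


def lemma_c1_bound(p, a):
    """Upper bound (sqrt(pi log p) * p^a)^(-1) on the failure probability."""
    return 1.0 / (math.sqrt(math.pi * math.log(p)) * p ** a)


rng = random.Random(4)
rate = simulate_failure_rate(n=30, p=40, sigma=1.0, a=0.5, theta=0.1, trials=300, rng=rng)
print(rate, lemma_c1_bound(40, 0.5))
```

Because the columns have norm exactly $\sqrt{n}$ while the threshold carries the factor $1 + \theta$, the empirical frequency is expected to sit well below the bound.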
C.2 Proof of Theorem 3.1

Let $\hat\beta$ be an optimal solution to the Lasso as in (1.2). Let $S := \mathrm{supp}\, \beta$ and

$$v = \hat\beta - \beta.$$

We first show Lemma C.2; we then apply condition $RE(s, k_0, X)$ on $v$ with $k_0 = 3$ under $\mathcal{T}_a$ to show
Proposition C.3. Theorem 3.1 follows immediately from Proposition C.3.

Lemma C.2. (Bickel et al. (2009)) Under condition $\mathcal{T}_a$ as defined in (3.2), $\|v_{S^c}\|_1 \le 3 \|v_S\|_1$ for $\lambda_n \ge
2(1 + \theta) \lambda_{\sigma,a,p}$ for the Lasso.



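Proof. By the optimality of $\hat\beta$, we have

\begin{align*}
\lambda_n \|\beta\|_1 - \lambda_n \|\hat\beta\|_1 &\ge \frac{1}{2n} \|Y - X\hat\beta\|_2^2 - \frac{1}{2n} \|Y - X\beta\|_2^2 \\
&\ge \frac{1}{2n} \|Xv\|_2^2 - \frac{v^T X^T \epsilon}{n}.
\end{align*}

Hence under condition $\mathcal{T}_a$ as in (3.2), we have for $\lambda_n \ge 2(1 + \theta) \lambda_{\sigma,a,p}$,

\begin{align*}
\|Xv\|_2^2 / n &\le 2 \lambda_n \|\beta\|_1 - 2 \lambda_n \|\hat\beta\|_1 + 2 \left\| \frac{X^T \epsilon}{n} \right\|_\infty \|v\|_1 \\
&\le \lambda_n \left( 2 \|\beta\|_1 - 2 \|\hat\beta\|_1 + \|v\|_1 \right), \tag{C.1}
\end{align*}

where by the triangle inequality, and $\beta_{S^c} = 0$, we have

\begin{align*}
0 &\le 2 \|\beta\|_1 - 2 \|\hat\beta\|_1 + \|v\|_1 \\
&= 2 \|\beta_S\|_1 - 2 \|\hat\beta_S\|_1 - 2 \|v_{S^c}\|_1 + \|v_S\|_1 + \|v_{S^c}\|_1 \\
&\le 3 \|v_S\|_1 - \|v_{S^c}\|_1. \tag{C.2}
\end{align*}

Thus Lemma C.2 holds. □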



We now show Proposition C.3, where except for the $\ell_2$-convergence rate as in (C.5), all bounds have essentially
been shown in Bickel et al. (2009) (as Theorem 7.2) under Assumption $RE(s, 3, X)$. The bound on
$\|v\|_2$, as far as the author is aware, is new; however, this result is indeed also implied by Theorem 7.2
in Bickel et al. (2009) given Proposition A.1 as derived in this paper. We note that the same remark
holds for Proposition C.5; see Bickel et al. (2009, Theorem 7.1).






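Proposition C.3. ($\ell_p$-loss for the Lasso) Suppose that $RE(s, 3, X)$ holds. Let $Y = X\beta + \epsilon$, for $\epsilon$ being
i.i.d. $N(0, \sigma^2)$ and $\|X_j\|_2 \le (1 + \theta) \sqrt{n}$. Let $\hat\beta$ be an optimal solution to (1.2) with $\lambda_n \ge 2(1 + \theta) \lambda_{\sigma,a,p}$,
where $a \ge 0$. Let $v = \hat\beta - \beta$. Then on condition $\mathcal{T}_a$ as in (3.2), the following hold for $B_0 = 4K^2(s, 3, X)$:

\begin{align*}
\|v_S\|_2 &\le B_0 \lambda_n \sqrt{s}, \tag{C.3} \\
\|v\|_1 &\le B_0 \lambda_n s, \quad \text{where } \|v_{S^c}\|_1 \le 3 \|v_S\|_1, \tag{C.4} \\
\text{and } \|v\|_2 &\le 2 B_0 \lambda_n \sqrt{s}. \tag{C.5}
\end{align*}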

Proof. The first part of this proof follows that of Bickel et al. (2009). Now under condition $\mathcal{T}_a$, by (C.1)
and (C.2),

\begin{align*}
\|Xv\|_2^2 / n + \lambda_n \|v\|_1 &\le \lambda_n \left( 3 \|v_S\|_1 - \|v_{S^c}\|_1 + \|v_S\|_1 + \|v_{S^c}\|_1 \right) \\
&= 4 \lambda_n \|v_S\|_1 \le 4 \lambda_n \sqrt{s}\, \|v_S\|_2 \tag{C.6} \\
&\le 4 \lambda_n \sqrt{s}\, K(s, 3, X) \|Xv\|_2 / \sqrt{n} \tag{C.7} \\
&\le 4 K^2(s, 3, X) \lambda_n^2 s + \|Xv\|_2^2 / n, \tag{C.8}
\end{align*}

where (C.7) holds by definition of $RE(s, 3, X)$. Thus we have by (C.8) that

$$\|v_S\|_1 \le \|v\|_1 \le 4 K^2(s, 3, X) \lambda_n s, \tag{C.9}$$

which implies that (C.4) holds with $B_0 = 4K^2(s, 3, X)$. Now by $RE(s, 3, X)$ and (C.6), we have

$$\|v_S\|_2^2 \le K^2(s, 3, X) \|Xv\|_2^2 / n \le K^2(s, 3, X)\, 4 \lambda_n \sqrt{s}\, \|v_S\|_2, \tag{C.10}$$

which immediately implies that (C.3) holds.

Finally, we have by (A.5), (C.9), (1.14) and the $RE(s, 3, X)$ condition,

\begin{align*}
\|v\|_2 &\le \|v_{T_0}\|_2 + s^{-1/2} \|v\|_1 \\
&\le K(s, 3, X) \|Xv\|_2 / \sqrt{n} + 4 K^2(s, 3, X) \lambda_n \sqrt{s} \tag{C.11} \\
&\le K(s, 3, X) \sqrt{4 \lambda_n \|v_S\|_1} + 4 K^2(s, 3, X) \lambda_n \sqrt{s} \tag{C.12} \\
&\le 8 \lambda_n K^2(s, 3, X) \sqrt{s}, \tag{C.13}
\end{align*}

where in (C.11), we crucially exploit the universality of the RE condition; in (C.12), we use the bound
in (C.6); and in (C.13), we use (C.9). □

C.3 Proof of Theorem 3.2

Let $\hat\beta$ be an optimal solution to the Dantzig selector as in (1.3). Let $S := \mathrm{supp}\, \beta$ and

$$v = \hat\beta - \beta.$$

We first show Lemma C.4; we then apply condition $RE(s, k_0, X)$ to $v$ with $k_0 = 1$ under $\mathcal{T}_a$ to show
Proposition C.5. Theorem 3.2 follows immediately from Proposition C.5.

Lemma C.4. (Candes and Tao (2007)) Under condition $\mathcal{T}_a$, $\|v_{S^c}\|_1 \le \|v_S\|_1$ for $\lambda_n \ge (1 + \theta) \lambda_{\sigma,a,p}$, where
$a \ge 0$ and $0 < \theta < 1$ for the Dantzig selector.



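Proof. Clearly the true vector $\beta$ is feasible to (1.3), as

$$\left\| \frac{1}{n} X^T (Y - X\beta) \right\|_\infty = \left\| \frac{1}{n} X^T \epsilon \right\|_\infty \le (1 + \theta) \lambda_{\sigma,a,p} \le \lambda_n,$$

hence by the optimality of $\hat\beta$,

$$\|\hat\beta\|_1 \le \|\beta\|_1.$$

Hence it holds under $\mathcal{T}_a$ for $v = \hat\beta - \beta$ that

$$\|\beta\|_1 - \|v_S\|_1 + \|v_{S^c}\|_1 \le \|\beta + v\|_1 = \|\hat\beta\|_1 \le \|\beta\|_1, \tag{C.14}$$

and hence $v$ obeys the cone constraint as desired. □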



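Proposition C.5. ($\ell_p$-loss for the Dantzig selector) Suppose that $RE(s, 1, X)$ holds. Let $Y = X\beta + \epsilon$, for $\epsilon$
being i.i.d. $N(0, \sigma^2)$ and $\|X_j\|_2 \le (1 + \theta) \sqrt{n}$. Let $\hat\beta$ be an optimal solution to (1.3) with $\lambda_n \ge (1 + \theta) \lambda_{\sigma,a,p}$,
where $a \ge 0$ and $0 < \theta < 1$. Then on condition $\mathcal{T}_a$ as in (3.2), the following hold with $B_1 = 4K^2(s, 1, X)$:

\begin{align*}
\|v_S\|_2 &\le B_1 \lambda_n \sqrt{s}, \tag{C.15} \\
\|v\|_1 &\le 2 B_1 \lambda_n s, \quad \text{where } \|v_{S^c}\|_1 \le \|v_S\|_1, \tag{C.16} \\
\text{and } \|v\|_2 &\le 3 B_1 \lambda_n \sqrt{s}. \tag{C.17}
\end{align*}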


Remark C.6. See comments in front of Proposition C.3. 



Proof of Proposition C.5. Our proof follows that of Bickel et al. (2009). Let $\hat\beta$ be an optimal solution
to (1.3). Let $v = \hat\beta - \beta$ and let $\mathcal{T}_a$ hold for $a \ge 0$ and $0 < \theta < 1$. By the constraint of (1.3), we have

$$\left\| \frac{1}{n} X^T X v \right\|_\infty \le \left\| \frac{1}{n} X^T (Y - X\hat\beta) \right\|_\infty + \left\| \frac{1}{n} X^T \epsilon \right\|_\infty \le 2 \lambda_n,$$

and hence by Lemma C.4, we have

\begin{align*}
\|Xv\|_2^2 / n = \frac{v^T X^T X v}{n} &\le \left\| \frac{X^T X v}{n} \right\|_\infty \|v\|_1 \le 2 \lambda_n \|v\|_1 \\
&\le 4 \lambda_n \|v_S\|_1 \le 4 \lambda_n \sqrt{s}\, \|v_S\|_2. \tag{C.18}
\end{align*}

We now apply condition $RE(s, k_0, X)$ on $v$ with $k_0 = 1$ to obtain

$$\|v_S\|_2^2 \le K^2(s, 1, X) \|Xv\|_2^2 / n \le K^2(s, 1, X)\, 4 \lambda_n \sqrt{s}\, \|v_S\|_2, \tag{C.19}$$

which immediately implies that (C.15) holds. Hence (C.16) holds with $B_1 = 4K^2(s, 1, X)$ given (C.19)
and

$$\|v_{S^c}\|_1 \le \|v_S\|_1 \le 4 K^2(s, 1, X) \lambda_n s. \tag{C.20}$$

Finally, we have by (A.5), (C.20), (1.14) and the $RE(s, 1, X)$ condition,

\begin{align*}
\|v\|_2 &\le \|v_{T_0}\|_2 + s^{-1/2} \|v\|_1 \\
&\le K(s, 1, X) \|Xv\|_2 / \sqrt{n} + 8 K^2(s, 1, X) \lambda_n \sqrt{s} \tag{C.21} \\
&\le K(s, 1, X) \sqrt{4 \lambda_n \|v_S\|_1} + 8 K^2(s, 1, X) \lambda_n \sqrt{s} \tag{C.22} \\
&\le 12 \lambda_n K^2(s, 1, X) \sqrt{s}, \tag{C.23}
\end{align*}

where in (C.21), we crucially exploit the universality of the RE condition; in (C.22), we use the bounds
in (C.20) and (C.18); and in (C.23), we use (C.20) again. □



D A fundamental proof for the Gaussian random design 



In this section, we state a theorem for the Gaussian random design, following a more fundamental proof
given by Raskutti et al. (2009) (cf. Proposition 1). We apply their method and provide a tighter bound
on the sample size that is required in order for $X$ to satisfy the RE condition, where $X$ is composed of
independent rows with multivariate Gaussian vectors drawn from $N(0, \Sigma)$ as in (1.18). We note that both
upper and lower bounds in Theorem D.1 are obtained in a way that is quite similar to how the largest and
smallest singular values of a Gaussian random matrix are upper and lower bounded respectively; see for
example Davidson and Szarek (2001). The improvement over results in Raskutti et al. (2009) comes from
the tighter bound on $\ell_*(\Upsilon)$ as developed in Lemma 2.2. Formally, we have the following.
Theorem D.1. Set $1 \le n \le p$ and $0 < \theta < 1$. Consider a random design $X$ as in (1.11), where $\Sigma$
satisfies (1.12) and (1.16). Suppose $s < p/2$ and for $C$ as in (1.19),

$$n > \frac{1}{\theta^2} \left( C \sqrt{s \log (5ep/s)} + \sqrt{2 d \log p} \right)^2 \tag{D.1}$$

for $d > 0$. Then we have with probability at least $1 - 4/p^d$,

$$(1 - \theta - o(1)) \|\Sigma^{1/2} \delta\|_2 \le \|X\delta\|_2 / \sqrt{n} \le (1 + \theta) \|\Sigma^{1/2} \delta\|_2 \tag{D.2}$$

holds for all $\delta \ne 0$ that is admissible to (1.12), that is, $\exists$ some $J_0 \subset \{1, \ldots, p\}$ such that $|J_0| \le s$ and
$\|\delta_{J_0^c}\|_1 \le k_0 \|\delta_{J_0}\|_1$, where $k_0 > 0$.



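Proof. We only provide a sketch here; see Raskutti et al. (2009) for details. Using Slepian's Lemma and
its extension by Gordon (1985), the following inequalities have been derived by Raskutti et al. (2009) (cf.
Proof of Proposition 1 therein),

\begin{align*}
\mathbb{E} \inf_{\delta \in E_s} \|X\delta\|_2 &\ge \mathbb{E} \|g\|_2 - \mathbb{E} \sup_{\delta \in E_s} \langle h, \Sigma^{1/2} \delta \rangle, \\
\mathbb{E} \sup_{\delta \in E_s} \|X\delta\|_2 &\le \sqrt{n} + \mathbb{E} \sup_{\delta \in E_s} \langle h, \Sigma^{1/2} \delta \rangle,
\end{align*}

where $g$ and $h$ are random vectors with i.i.d. Gaussian $N(0, 1)$ elements in $\mathbb{R}^n$ and $\mathbb{R}^p$ respectively. Now
Lemma D.2 follows immediately, after we plug in the bound as in Lemma 2.2 on

$$\ell_*(\Upsilon) := \mathbb{E} \sup_{\delta \in E_s} \langle h, \Sigma^{1/2} \delta \rangle.$$

Lemma D.2. Suppose $\Sigma$ satisfies Assumption 1.2. Then for $C$ as in Theorem D.1, we have

\begin{align*}
\mathbb{E} \inf_{\delta \in E_s} \|X\delta\|_2 &\ge \sqrt{n} - o(\sqrt{n}) - C \sqrt{s \log (5ep/s)}, \tag{D.3} \\
\mathbb{E} \sup_{\delta \in E_s} \|X\delta\|_2 &\le \sqrt{n} + C \sqrt{s \log (5ep/s)}. \tag{D.4}
\end{align*}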


We then apply the concentration of measure inequality for $\inf_{\delta \in E_s} \|X\delta\|_2$, for which it is well known that
the 1-Lipschitz condition holds for $\inf_{\delta \in E_s} \|X\delta\|_2 = \inf_{\delta \in E_s} \|A \Sigma^{1/2} \delta\|_2$, where $A$ is a matrix with i.i.d.
standard normal random variables in $\mathbb{R}^{n \times p}$. Recall that a function $f : X \to Y$ is called 1-Lipschitz if
for all $x, y \in X$,

$$d_Y(f(x), f(y)) \le d_X(x, y).$$

Proposition D.3. View the Gaussian random matrix $A$ as a canonical Gaussian vector in $\mathbb{R}^{np}$. Let $f(A) :=
\inf_{\delta \in E_s} \|A \Sigma^{1/2} \delta\|_2$ and $f'(A) := \sup_{\delta \in E_s} \|A \Sigma^{1/2} \delta\|_2$ be two functions of $A$ from $\mathbb{R}^{np}$ to $\mathbb{R}$. Then $f, f' :
\mathbb{R}^{np} \to \mathbb{R}$ are 1-Lipschitz:

\begin{align*}
|f(A) - f(B)| &\le \|A - B\|_2 \le \|A - B\|_F, \\
|f'(A) - f'(B)| &\le \|A - B\|_2 \le \|A - B\|_F.
\end{align*}
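
The Lipschitz property in Proposition D.3 can be illustrated on a finite surrogate of the index set. The sketch below is not the paper's construction: it replaces $\{\Sigma^{1/2} \delta : \delta \in E_s\}$ by a hypothetical finite list `W` of random unit vectors, and it checks only the Frobenius-norm form of the inequality, since that is the one used for Gaussian concentration. All names and sizes are assumptions for the example.

```python
import math
import random


def matvec(A, u):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def frobenius_distance(A, B):
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


def f_inf(A, W):
    """Infimum of ||A w||_2 over the finite set W of unit vectors."""
    return min(l2(matvec(A, w)) for w in W)


def f_sup(A, W):
    """Supremum of ||A w||_2 over the finite set W of unit vectors."""
    return max(l2(matvec(A, w)) for w in W)


rng = random.Random(3)
n, p = 8, 12
# Finite surrogate index set: 50 random unit vectors in R^p.
W = []
for _ in range(50):
    w = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm_w = l2(w)
    W.append([x / norm_w for x in w])

A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
B = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
dist = frobenius_distance(A, B)
print(abs(f_inf(A, W) - f_inf(B, W)) <= dist, abs(f_sup(A, W) - f_sup(B, W)) <= dist)
```

Each map $A \mapsto \|Aw\|_2$ with $\|w\|_2 = 1$ is 1-Lipschitz in the Frobenius norm, and infima and suprema of 1-Lipschitz functions remain 1-Lipschitz, which is why both checks succeed for any choice of `A`, `B` and `W`.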



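Finally we apply the concentration of measure in Gauss space to obtain for $t > 0$,

\begin{align*}
\mathbb{P}(|f(A) - \mathbb{E} f(A)| > t) &\le 2 \exp(-t^2/2), \text{ and} \\
\mathbb{P}(|f'(A) - \mathbb{E} f'(A)| > t) &\le 2 \exp(-t^2/2).
\end{align*}

Now it is clear that with probability at least $1 - 4/p^d$, where $d > 0$, we have for $X = A \Sigma^{1/2}$

\begin{align*}
\inf_{\delta \in E_s} \|X\delta\|_2 =: f(A) &\ge \mathbb{E} \inf_{\delta \in E_s} \|A \Sigma^{1/2} \delta\|_2 - \sqrt{2 d \log p} \\
&\ge \sqrt{n} - o(\sqrt{n}) - C \sqrt{s \log (5ep/s)} - \sqrt{2 d \log p}, \tag{D.5}
\end{align*}

which we denote as event $\mathcal{F}$, and

\begin{align*}
\sup_{\delta \in E_s} \|X\delta\|_2 =: f'(A) &\le \mathbb{E} \sup_{\delta \in E_s} \|A \Sigma^{1/2} \delta\|_2 + \sqrt{2 d \log p} \\
&\le \sqrt{n} + C \sqrt{s \log (5ep/s)} + \sqrt{2 d \log p}, \tag{D.6}
\end{align*}

which we denote as $\mathcal{F}'$. Now it is clear that (D.2) holds on $\mathcal{F} \cap \mathcal{F}'$, given (D.1). □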